1 Introduction

This is a brief exercise comparing the results of five forecast models: three statistical methods and two machine learning models. I focus on Indian Gross Domestic Product (GDP) growth at constant market prices. The dataset contains quarterly data from 2000-01 Q4 to 2018-19 Q4, sourced from the RBI's Database on Indian Economy (DBIE). The National Accounts Statistics (NAS) series with base years 1999-00, 2004-05 and 2011-12 have been appropriately spliced into a single continuous series.
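Splicing here amounts to rescaling the older base-year series so that levels are comparable across base years. A minimal sketch of ratio splicing at the overlap quarter (the numbers are illustrative, not the actual NAS data):

```r
# Ratio splicing: rescale the old-base series so that it matches the
# new-base series in the overlap quarter, then chain the two together.
# Toy numbers only -- these are not the actual NAS levels.
splice <- function(old, new, overlap) {
  # 'overlap' indexes the quarter present in both series (the last of
  # 'old', the first of 'new'); scale 'old' by the ratio of levels there.
  factor <- new[1] / old[overlap]
  c(old[1:(overlap - 1)] * factor, new)
}

old_base <- c(100, 104, 108, 112)  # hypothetical 1999-00 base series
new_base <- c(120, 126, 131)       # hypothetical 2004-05 base series,
                                   # starting at old_base's last quarter
spliced <- splice(old_base, new_base, overlap = 4)
```

The rescaling preserves the growth rates of the old series while removing the level break at the base-year change.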

I carry out a univariate analysis. First, the raw data is read and year-on-year (YoY) growth is computed.

raw <- read.csv("WorkbookFinal.csv")$GDP
# Year-on-year growth: each quarter vs the same quarter a year earlier
raw <- 100 * diff(raw, lag = 4) / head(raw, -4)

GDP <- ts(raw,frequency=4,start=c(2002,1))
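As a quick sanity check, the same diff-and-divide logic applied to a toy quarterly series:

```r
# Year-on-year growth for quarterly data: compare each quarter with the
# same quarter of the previous year (lag = 4).
x <- c(100, 102, 104, 106, 110, 112, 115, 118)
yoy <- 100 * diff(x, lag = 4) / head(x, -4)
# First value: (110 - 100) / 100 * 100 = 10
```

The transformed series is four observations shorter than the input, which is why the sample effectively starts a year after the first raw observation.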

The graph below is interactive (move the pointer over observations to see exact growth values).

2 Forecast Models

The statistical models under consideration are the naive forecast, the random walk with drift (the naive forecast plus a drift term), and the best-fit ARIMA model as selected by the auto.arima function from the forecast package in R.

The machine learning models are k-nearest neighbours (KNN) regression, a widely used nonparametric benchmark algorithm, and a neural network autoregression (NNAR) model with a single hidden layer and lagged inputs.
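The idea behind the KNN forecast can be sketched in a few lines of base R: represent the recent past as a vector of lags, find the k historical lag-patterns most similar to it, and average the values that followed them. This is a simplified one-step version of what tsfknn's knn_forecasting does; the lag length and k below are illustrative:

```r
# One-step KNN forecast: average the observations that followed the k
# historical lag-patterns closest (in Euclidean distance) to the latest one.
knn_one_step <- function(y, lags = 4, k = 3) {
  n <- length(y)
  # Matrix of lag-patterns (one per row) and their one-step-ahead targets
  idx <- seq(lags, n - 1)
  patterns <- t(sapply(idx, function(i) y[(i - lags + 1):i]))
  targets <- y[idx + 1]
  # Distance from the most recent pattern to every historical pattern
  latest <- y[(n - lags + 1):n]
  d <- sqrt(rowSums((patterns - matrix(latest, nrow(patterns), lags,
                                       byrow = TRUE))^2))
  # Average the targets of the k nearest neighbours
  mean(targets[order(d)[1:k]])
}

# On a linear trend, the 3 nearest lag-patterns precede 18, 19 and 20,
# so the forecast is their mean: 19
knn_one_step(1:20, lags = 4, k = 3)
```

The recursive multi-step strategy (msas = "recursive" in tsfknn) simply appends each one-step forecast to the series and repeats.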

I employ an iterative multi-step forecast method, in which the training data is extended at each iteration to mimic real-time forecasting, and focus on forecasts up to 8 quarters ahead.
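The rolling-origin scheme can be illustrated with the naive forecast alone: at each origin the training window grows by one observation and an h-step-ahead forecast is produced. A base-R sketch (origins and horizon here are illustrative):

```r
# Rolling-origin evaluation of the naive forecast: at every origin the
# training sample grows by one observation, and the h-step-ahead naive
# forecast is simply the last value in the training window.
rolling_naive_errors <- function(y, origins, h) {
  sapply(origins, function(i) {
    train <- y[1:(i - 1)]            # everything before the origin
    fc <- rep(tail(train, 1), h)     # naive: repeat the last observation
    actual <- y[i:(i + h - 1)]       # realised values at horizons 1..h
    actual - fc
  })                                 # h x length(origins) error matrix
}

y <- c(2, 4, 6, 8, 10, 12, 14, 16)
e <- rolling_naive_errors(y, origins = 5:7, h = 2)
rmse_by_horizon <- sqrt(rowMeans(e^2))
```

Accuracy at each horizon is then summarised across origins, which is exactly what the loops below do for the five candidate models.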

Appropriate loop structures generate forecasts for the final 15 in-sample observations, from which forecast accuracy is measured, both absolute and relative.


Models

  1. Naive
  2. Random Walk (with Drift)
  3. KNN Regression
  4. ARIMA
  5. Neural Network Autoregression

Code

library(forecast)   # rwf, nnetar, tsCV and forecast methods
library(tsfknn)     # knn_forecasting, rolling_origin

knn <- list(); nne <- list(); tmp <- numeric(8)
RMSE <- data.frame(matrix(ncol = 5, nrow = 8,
                          dimnames = list(NULL, c("Naive", "RandomWalk", "KNN", "ARIMA", "NNAR"))))

for (hor in 1:8) {
  # Rolling-origin forecasts from the final 15 in-sample origins
  for (i in 56:70) {
    train <- GDP[1:(i - 1)]

    # KNN regression with a recursive multi-step strategy
    pred <- knn_forecasting(train, h = hor, lags = 1:12, k = 4,
                            msas = "recursive")
    ro <- rolling_origin(pred, h = hor)
    knn[[i - 55]] <- ro$errors

    # Neural network autoregression, single hidden layer of 4 nodes
    fit <- nnetar(train, size = 4)
    fcast <- forecast(fit, h = hor)
    for (k in 1:hor) tmp[k] <- GDP[i - 1 + k] - fcast$mean[k]
    nne[[i - 55]] <- tmp[1:hor]
  }

  # Naive forecast: random walk without drift, 54-observation window
  e1 <- tsCV(GDP, rwf, drift = FALSE, h = 8, window = 54)
  RMSE[hor, 1] <- sqrt(mean(e1[, hor]^2, na.rm = TRUE))

  # Random walk with drift
  e2 <- tsCV(GDP, rwf, drift = TRUE, h = 8, window = 54)
  RMSE[hor, 2] <- sqrt(mean(e2[, hor]^2, na.rm = TRUE))

  # KNN errors at the current horizon
  kerr <- sapply(knn, function(m) as.numeric(m[1, ][hor]))
  RMSE[hor, 3] <- sqrt(mean(kerr^2, na.rm = TRUE))

  # Fixed ARIMA(1,0,0) specification (as selected by auto.arima)
  far2 <- function(x, h) forecast(arima(x, order = c(1, 0, 0)), h = h)
  e3 <- tsCV(GDP, far2, h = 8)
  RMSE[hor, 4] <- sqrt(mean(e3[, hor]^2, na.rm = TRUE))

  # NNAR errors at the current horizon
  nerr <- sapply(nne, function(v) as.numeric(v[hor]))
  RMSE[hor, 5] <- sqrt(mean(nerr^2, na.rm = TRUE))
}

# Relative RMSE with the naive forecast (column 1) as the benchmark
relRMSE <- RMSE / RMSE[, 1]

3 Forecast Results

Please choose the desired forecast horizon from the tabs below.

1-Qtr Ahead

2-Qtr Ahead

4-Qtr Ahead

8-Qtr Ahead

4 Relative Model Performance

Finally, I examine relative model performance by evaluating the relative root mean squared error (RMSE) across all eight forecast horizons, with the naive forecast serving as the benchmark. It is fairly clear that the naive model is difficult to outperform despite the relative complexity of its competitors.

Nonetheless, the neural network autoregression model outperforms all others, including the naive forecast, at the one-quarter horizon. Furthermore, KNN regression provides better forecasts for horizons between two and six quarters ahead. These results are, needless to say, sensitive to the lag structures and other hyperparameters passed to the machine learning models, which could be further optimised. Hence the case for tuning approaches that account for both model characteristics and the underlying data generating process.

Thank You